Abstract



This paper studies the problem of testing if an input (F, o), where F is a finite set of unknown size and o is a
binary operation over F given as an oracle, is close to a specified class of groups. Friedl et al. [Efficient testing of
groups, STOC'05] have constructed an efficient tester using poly(log |F|) queries for the class of abelian groups.
We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: Ω(|F|^{1/6})
queries are necessary to test if the input is close to a cyclic group, and Ω(|F|^c) queries for some constant c are
necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed
integer k ≥ 1. We also show that knowledge of the size of the ground set F helps only for k = 1, in which
case we construct an efficient tester using poly(log |F|) queries; for any other value k ≥ 2 the query complexity
remains Ω(|F|^c). All our upper and lower bounds hold for both the edit distance and the Hamming distance.
These are, to the best of our knowledge, the first nontrivial lower bounds for such group-theoretic problems in
the property testing model and, in particular, they imply the first exponential separations between the classical
and quantum query complexities of testing closeness to classes of groups.

1 Introduction 

Background: Property testing is concerned with the task of deciding whether an object given as an oracle has
(or is close to having) some expected property. Many properties, including algebraic function properties, graph
properties, computational geometry properties and regular languages, have been proved to be efficiently testable.
We refer to, for example, Refs. [8], [15], [17] for surveys on property testing. In this paper, we focus on property
testing of group-theoretic properties. An example is testing whether a function f : G → H, where G and H are
groups, is a homomorphism. It is well known that such a test can be done efficiently fl1l5l [T8l . 

Another kind of group-theoretic problem deals with the case where the input consists of both a finite set F
and a binary operation o : F × F → F over it given as an oracle. An algorithm testing associativity of the oracle
in time O(|F|^2) has been constructed by Rajagopalan and Schulman [16], improving the straightforward O(|F|^3)-
time algorithm. They also showed that Ω(|F|^2) queries are necessary for this task. Ergün et al. [9] have proposed
an algorithm using O(|F|) queries testing if o is close to associative, and an algorithm using O(|F|^{3/2}) queries
testing if (F, o) is close to being both associative and cancellative (i.e., close to the operation of a group). They also
showed how these results can be used to check whether the input (F, o) is close to an abelian group with O(|F|^{3/2})
queries. The notion of closeness discussed in Ergün et al.'s work refers to the Hamming distance of multiplication
tables, i.e., the number of entries in the multiplication table of (F, o) that have to be modified to obtain a binary
operation satisfying the prescribed property.



Friedl et al. [10] have shown that, when considering closeness with respect to the edit distance of multiplication
tables instead of the Hamming distance (i.e., by allowing deletion and insertion of rows and columns), there exists
an algorithm with query and time complexities polynomial in log |F| that tests whether (F, o) is close to an abelian
group. An open question is to understand for which other classes of groups such a test can be done efficiently and, 
on the other hand, if nontrivial lower bounds can be proved for specific classes of groups. 

Notice that the algorithm in Ref. [10] has been obtained by first constructing a simple quantum algorithm that 
tests in poly(log |F|) time if an input (F, o) is close to an abelian group (based on a quantum algorithm by Cheung
and Mosca lH computing efficiently the decomposition of a black-box abelian group on a quantum computer), and 
then replacing the quantum part by clever classical tests. One can find this surprising since, classically, computing 
the decomposition of a black-box abelian group is known to be hard [2]. This indicates that, in some cases, new 
ideas in classical property testing can be derived from a study of quantum testers. One can naturally wonder if all 
efficient quantum algorithms testing closeness to a given class of groups can be converted into efficient classical 
testers in a similar way. This question is especially motivated by the fact that Inui and Le Gall [11] have constructed
a quantum algorithm with query complexity polynomial in log |F| that tests whether (F, o) is close to a solvable 
group (note that the class of solvable groups includes all abelian groups), and that their techniques can also be used 
to test efficiently closeness to several subclasses of abelian groups on a quantum computer, as discussed later. 
Our contributions: In this paper we investigate these questions by focusing on subclasses of abelian groups. We 
show lower and upper bounds on the randomized (i.e., non-quantum) query complexity of testing if the input is 
close to a cyclic group, and more generally on the randomized query complexity of testing if the input is close to an 
abelian group generated by k elements (i.e., the class of groups of the form Z_{m_1} × ··· × Z_{m_r} where 1 ≤ r ≤ k and
m_1, ..., m_r are positive integers), for any fixed k ≥ 1 and for both the edit distance and the Hamming distance.
We prove in particular that their complexities vary dramatically according to the value of k and according to the
assumption that the size of F is known or not. Table 1 gives an overview of our results.

Table 1: Lower and upper bounds on the randomized query complexity of testing if (F, o) is close to specific classes
of groups. Here ε denotes the distance parameter, see Section 2 for details.

Target                               Distance          Bound                           Ref.
group                                edit or Hamming   O(|F|^{3/2})                    [9]
abelian group                        edit              O(poly(ε^{-1}, log |F|))        [10]
cyclic group (size unknown)          edit or Hamming   Ω(|F|^{1/6})                    here (Th. 1)
abelian group with k generators      edit or Hamming   Ω(|F|^{1/6 - 4/(6(3k+1))})      here (Th. 2)
  [k: fixed integer > 1]
cyclic group (size known)            edit or Hamming   O(poly(ε^{-1}, log |F|))        here (Th. 3)



Our results show that, with respect to the edit distance, testing closeness to subclasses of abelian groups generally
requires exponentially more queries than testing closeness to the whole class of abelian groups. We believe that
this puts in perspective Friedl et al.'s work [10] and indicates both the strength and the limitations of their results.

The lower bounds we give in Theorems 1 and 2 also prove the first exponential separations between the quantum
and randomized query complexities of testing closeness to a class of groups. Indeed, the same arguments as
in Ref. [11] easily show that, when the edit distance is considered, testing if the input is close to an abelian group
generated by k elements can be done using poly(ε^{-1}, log |F|) queries on a quantum computer, for any value of k and
even if |F| is unknown. While this refutes the possibility that all efficient quantum algorithms testing closeness to a 
given class of groups can be converted into efficient classical testers, this also exhibits a new set of computational 
problems for which quantum computation can be shown to be strictly more efficient than classical computation. 
Relation with other works: While Ivanyos [12] gave heuristic arguments indicating that testing closeness to a 
group may be hard in general, we are not aware of any (nontrivial) proven lower bounds on the query complexity 
of testing closeness to a group-theoretic property prior to the present work. Notice that a few strong lower bounds
are known for related computational problems, but in different settings. Babai (V\ and Babai and Szemeredi lU 
showed that computing the order of an elementary abelian group in the black-box setting requires exponential time 
— this task is indeed one of the sometimes called "abelian obstacles" to efficient computation in black-box groups. 
Cleve Q also showed strong lower bounds on the query complexity of order finding (in a model based on hidden 
permutations rather than on an explicit group-theoretic structure). These results are deeply connected to the subject 
of the present paper and inspired some of our investigations, but do not give bounds in the property testing setting. 
The proof techniques we introduce in the present paper are indeed especially tailored for this setting. 
Organization of the paper and short description of our techniques: Section 3 deals with the case where |F| is
unknown. Our lower bound on the complexity of testing closeness to a cyclic group (Theorem 1) is proven in a
way that can informally be described as follows. We introduce two distributions of inputs: one consisting of cyclic
groups of the form Z_{p^2}, and another consisting of groups of the form Z_p × Z_p, where p is an unknown prime number
chosen in a large enough set of primes. We observe that each group in the latter distribution is far with respect to
the edit distance (and thus with respect to the Hamming distance too) from any cyclic group. We then prove that a
deterministic algorithm with o(|F|^{1/6}) queries cannot distinguish those distributions with high probability.

Section 4 focuses on testing closeness to the class of groups generated by k > 1 elements, and proves Theorem 2
in a similar way. For example, when k > 1 is a fixed odd integer, we introduce two distributions consisting of
groups isomorphic to G_p = Z_{p^2}^{(k+1)/2} × Z_p^{(k-1)/2} and to H_p = Z_{p^2}^{(k-1)/2} × Z_p^{(k+3)/2}, respectively. Notice that
G_p and H_p have the same size. While G_p is generated by k elements, we observe that H_p is far from any group
generated by k elements. We then show that any deterministic algorithm with o(p^{(k-1)/4}) = o(|F|^{1/6 - 4/(6(3k+1))})
queries cannot distinguish those distributions with high probability, even if p (and thus |F|) is known.

Section 5 is devoted to constructing an efficient tester for testing closeness to cyclic groups when the size |F| of
the ground set is known. The idea behind the tester we propose is that, when the size |F| of the ground set is given,
we know that if (F, o) is a cyclic group, then it is isomorphic to the group Z_{|F|}. We then take a random element γ of
F and define the map f : Z_{|F|} → F by f(i) = γ^i for any i ∈ {0, ..., |F| - 1} (here the powers are defined carefully
to take into consideration the case where the operation o is not associative). If (F, o) is a cyclic group, then γ is
a generating element with non-negligible probability, in which case the map f will be a group isomorphism. Our
algorithm will first test if the map f is close to a homomorphism, and then perform additional tests to check that f
behaves correctly on any proper subgroup of Z_{|F|}.

2 Definitions 

Let F be a finite set and o : F × F → F be a binary operation on it. Such a pair (F, o) is called a magma. We
first define the Hamming distance between two magmas over the same ground set.

Definition 1. Let (F, o) and (F, *) be two magmas over the same ground set F. The Hamming distance between o
and *, denoted Ham_F(o, *), is Ham_F(o, *) = |{(x, y) ∈ F × F | x o y ≠ x * y}|.
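
As a small illustration of Definition 1, the following sketch counts the pairs on which two operations over the same ground set disagree. The function name `hamming_distance` and the two example operations (addition modulo 4 and bitwise XOR on {0, 1, 2, 3}) are assumptions chosen for this example, not objects taken from the paper.

```python
from itertools import product

def hamming_distance(elements, op1, op2):
    """Count pairs (x, y) on which the two binary operations disagree."""
    return sum(1 for x, y in product(elements, repeat=2) if op1(x, y) != op2(x, y))

# Illustrative ground set {0, 1, 2, 3} carrying two different group operations.
ground = range(4)
z4 = lambda x, y: (x + y) % 4   # cyclic group Z_4
klein = lambda x, y: x ^ y      # Z_2 x Z_2, encoded via bitwise XOR

distance = hamming_distance(ground, z4, klein)
```

With this particular encoding the two tables differ exactly on the pairs where both x and y are odd.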

We now define the edit distance between tables. A table of size k is a function T from Π × Π to N, where Π is
an arbitrary subset of N (the set of natural numbers) of size k. We consider three operations to transform a table to
another. An exchange operation replaces, for two elements a, b ∈ Π, the value T(a, b) by an arbitrary element of
N. Its cost is one. An insert operation on T adds a new element a ∈ N \ Π: the new table is the extension of T to the
domain (Π ∪ {a}) × (Π ∪ {a}), giving a table of size (k + 1) where the 2k + 1 new values of the function are set
arbitrarily. Its cost is 2k + 1. A delete operation on T removes an element a ∈ Π: the new table is the restriction of
T to the domain (Π \ {a}) × (Π \ {a}), giving a table of size (k - 1). Its cost is 2k - 1. The edit distance between
two tables T and T' is the minimum cost needed to transform T to T' by the above exchange, insert and delete
operations.
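
The costs of the insert and delete operations can be tabulated with a few lines of code. The sketch below only adds up the costs stated above; the helper names are hypothetical. It shows that changing the size of a table from k to k' by inserts alone (or deletes alone) costs |k'^2 - k^2|, the number of table entries gained or lost.

```python
def insert_cost(k):
    """Cost of inserting one element into a table of size k (2k + 1 new entries)."""
    return 2 * k + 1

def delete_cost(k):
    """Cost of deleting one element from a table of size k (2k - 1 entries removed)."""
    return 2 * k - 1

def resize_cost(k_from, k_to):
    """Total cost of the inserts (or deletes) alone needed to change the table size."""
    if k_to >= k_from:
        return sum(insert_cost(k) for k in range(k_from, k_to))
    return sum(delete_cost(k) for k in range(k_from, k_to, -1))
```

For instance, growing a table from size 3 to size 5 costs 7 + 9 = 16 = 5^2 - 3^2.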

A multiplication table for a magma (F, o) is a table T : Π × Π → N of size |F| for which the values are in one-to-
one correspondence with elements in F, i.e., there exists a bijection σ : Π → F such that T(a, b) = σ^{-1}(σ(a) o σ(b))
for any a, b ∈ Π. We now define the edit distance between two magmas, which will enable us to compare magmas
with distinct ground sets, and especially magmas with ground sets of different sizes. This is the same definition as
the one used in Ref. [10].

Definition 2. The edit distance between two magmas (F, o) and (F', *), denoted edit((F, o), (F', *)), is the minimum
edit distance between T and T', where T (resp. T') runs over all tables corresponding to a multiplication
table for (F, o) (resp. (F', *)).

We now explain the concept of distance to a class of groups. 

Definition 3. Let C be a class of groups and (F, o) be a magma. We say that (F, o) is δ-far from C with respect to
the Hamming distance if

    min Ham_F(o, *) ≥ δ|F|^2,

where the minimum is taken over all operations * : F × F → F such that (F, *) is a group in C.
We say that (F, o) is δ-far from C with respect to the edit distance if

    min edit((F, o), (F', *)) ≥ δ|F|^2,

where the minimum is taken over all magmas (F', *) such that (F', *) is a group in C.

Notice that if a magma (F, o) is δ-far from a class of groups C with respect to the edit distance, then (F, o) is
δ-far from C with respect to the Hamming distance. The converse is obviously false in general.

Since some of our results assume that the size of F is not known, we cannot suppose that the set F is given
explicitly. Instead we suppose that an upper bound q of the size of F is given, and that each element in F is represented
uniquely by a binary string of length ⌈log_2 q⌉. One oracle is available that generates a string representing a random
element of F, and another oracle is available that computes a string representing the product of two elements of
F. We call this representation a binary structure for (F, o). This is essentially the same model as the one used in
Ref. [10] and in the black-box group literature (see, e.g., Ref. [2]). The formal definition follows.

Definition 4. A binary structure for a magma (F, o) is a triple (q, O_1, O_2) such that q is an integer satisfying
q ≥ |F|, and O_1, O_2 are two oracles satisfying the following conditions:

(i) there exists an injective map π from F to Σ = {0, 1}^{⌈log_2 q⌉};

(ii) the oracle O_1 chooses an element x ∈ F uniformly at random and outputs the (unique) string z ∈ Σ such
that z = π(x);

(iii) on two strings z_1, z_2 in the set π(F), the oracle O_2 takes the (unique) element x ∈ F such that x = π^{-1}(z_1) o
π^{-1}(z_2) and outputs π(x). (The action of O_2 on strings in Σ \ π(F) is arbitrary.)
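
To make the oracle model of Definition 4 concrete, here is a minimal simulation sketch. The class name `BinaryStructure`, its attribute names, and the choice to answer the all-zero string on inputs outside π(F) are assumptions made for this illustration; the definition only requires that the answer there be arbitrary.

```python
import random

class BinaryStructure:
    """Toy binary structure (q, O1, O2) for a magma given by `elements` and `op`."""

    def __init__(self, elements, op, q, rng=None):
        self.rng = rng or random.Random()
        self.elements = list(elements)
        assert q >= len(self.elements), "q must be an upper bound on the size"
        self.op = op
        # ceil(log2 q) bits per string (at least one bit, an assumption for q = 1).
        self.length = max(1, (q - 1).bit_length())
        # Random injective map pi from elements to bit strings of that length.
        codes = self.rng.sample(range(2 ** self.length), len(self.elements))
        self.pi = {x: format(c, f"0{self.length}b") for x, c in zip(self.elements, codes)}
        self.pi_inv = {s: x for x, s in self.pi.items()}

    def O1(self):
        """Return the string representing a uniformly random element."""
        return self.pi[self.rng.choice(self.elements)]

    def O2(self, z1, z2):
        """Return the string representing the product of the two encoded elements."""
        if z1 not in self.pi_inv or z2 not in self.pi_inv:
            return "0" * self.length  # arbitrary answer outside pi(F)
        return self.pi[self.op(self.pi_inv[z1], self.pi_inv[z2])]
```

A tester only ever sees the strings returned by `O1` and `O2`; the dictionaries `pi` and `pi_inv` play the role of the hidden map and must not be read by the algorithm.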

We now give the formal definition of an e-tester. 

Definition 5. Let C be a class of groups and let ε be any value such that 0 < ε < 1. An ε-tester with respect to
the edit distance (resp., to the Hamming distance) for C is a randomized algorithm A such that, on any binary
structure for a magma (F, o),

(i) A outputs "PASS" with probability at least 2/3 if (F, o) satisfies property C;

(ii) A outputs "FAIL" with probability at least 2/3 if (F, o) is ε-far from C with respect to the edit distance
(resp., to the Hamming distance).

3 A Lower Bound for Testing Cyclic Groups



Suppose that we only know that an input instance (F, o) satisfies |F| ≤ q, where q is an integer known beforehand.
In this section, we show that any randomized algorithm then requires Ω(q^{1/6}) queries to test whether (F, o) is close
to the class of cyclic groups. More precisely, we prove the following result.

Theorem 1. Suppose that the size of the ground set is unknown and suppose that ε ≤ 1/23. Then the query
complexity of any ε-tester for the class of cyclic groups, with respect to the Hamming distance or the edit distance,
is Ω(q^{1/6}).

Theorem 1 is proved using Yao's minimax principle. Specifically, we introduce two distributions of instances
D_Y and D_N such that every instance in D_Y is a cyclic group and every instance in D_N is far from the class of
cyclic groups. Then we construct the input distribution D as the distribution that takes an instance from D_Y with
probability 1/2 and from D_N with probability 1/2. If we can show that any deterministic algorithm, given D as
an input distribution, requires Ω(q^{1/6}) queries to correctly decide whether an input instance is generated by D_Y or
D_N with high probability under the input distribution, we conclude that any randomized algorithm also requires
Ω(q^{1/6}) queries to test whether an input is close to a cyclic group.

We now explain in detail the construction of the distribution D. Define q' = ⌊√q⌋ and let R be the set of
primes in {q'/2, ..., q'}. From the prime number theorem, we have |R| = Ω(q'/log q'). We define D_Y as the
distribution over binary structures (q, O_1, O_2) for Z_{p^2} where the prime p is chosen uniformly at random from R
and the injective map π : Z_{p^2} → {0, 1}^{⌈log_2 q⌉} hidden behind the oracles is also chosen uniformly at random. We
define D_N as a distribution over binary structures for Z_p^2 = Z_p × Z_p in the same manner. Note that the order of any
instance generated by those distributions is at most q. Every instance in D_Y is a cyclic group. From Lemma 1 below, we
know that every instance in D_N is 1/23-far (with respect to the edit distance, and thus with respect to the Hamming
distance too) from the class of cyclic groups. Its proof is included in the Appendix.
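
The two families of hard instances can be generated explicitly for small parameters. The sketch below is only an illustration of the construction just described: the function names are hypothetical, primality is checked by trial division, and the random labelling of elements by strings is left out.

```python
import math
import random

def primes_in_window(q):
    """Primes in {q'/2, ..., q'} where q' = floor(sqrt(q))."""
    q_prime = math.isqrt(q)
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))
    return [p for p in range((q_prime + 1) // 2, q_prime + 1) if is_prime(p)]

def sample_instance(q, cyclic, rng):
    """Return (p, elements, op) for Z_{p^2} if `cyclic`, else for Z_p x Z_p."""
    p = rng.choice(primes_in_window(q))
    if cyclic:
        return p, list(range(p * p)), (lambda x, y: (x + y) % (p * p))
    elements = [(a, b) for a in range(p) for b in range(p)]
    return p, elements, (lambda x, y: ((x[0] + y[0]) % p, (x[1] + y[1]) % p))
```

Both kinds of instance have exactly p^2 ≤ q elements, so the size of the ground set alone does not reveal which family an instance was drawn from.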

Lemma 1. Let (G, o) and (H, *) be two nonisomorphic groups. Then edit((G, o), (H, *)) ≥ (1/23) max(|G|^2, |H|^2).

In order to complete the proof of Theorem 1, it only remains to show that distinguishing the two distributions
D_Y and D_N is hard. This is the purpose of the following proposition.

Proposition 1. Any deterministic algorithm that decides with probability larger than 2/3 whether the input is from
the distribution D_Y or from the distribution D_N must use Ω(q^{1/6}) queries.

Let us first give a very brief overview of the proof of Proposition 1. We begin by showing how the distributions
D_Y and D_N described above can equivalently be created by first taking a random sequence ℓ of strings, and then
using some constructions C_Y and C_N, respectively, which are much easier to deal with. In particular, the map π in
the constructions C_Y and C_N is created "on the fly" during the computation using the concept of a reduced decision
tree. We then show (in Lemma 2) an Ω(q^{1/6})-query lower bound for distinguishing C_Y and C_N.

Proof of Proposition 1. Let A be a deterministic algorithm with query complexity t. We suppose that t ≤ q,
otherwise there is nothing to do. The algorithm A can be seen as a decision tree of depth t. Each internal node
in the decision tree corresponds to a query to either O_1 or O_2, and each edge from such a node corresponds to an
answer for it. The queries to O_2 are labelled as O_2(s, s'), for elements s and s' in Σ = {0, 1}^{⌈log_2 q⌉}. Each answer
of a query is a binary string in Σ. Each leaf of the decision tree represents a YES or NO decision (deciding whether
the input is from D_Y or from D_N, respectively).

Since we want to prove a lower bound on the query complexity of A, we can freely make a modification
that gives a higher success probability on all inputs (and thus makes the algorithm A more powerful). We then
suppose that, when A goes through an edge corresponding to a string already seen during the computation,
A immediately stops and outputs the correct answer. With this modification, A reaches a leaf if and only if it did
not see the same string twice. We refer to Figure 1(a) for an illustration.

[Figure 1, panels (a) and (b): decision-tree diagrams, not reproduced here.]

Figure 1: (a) The decision tree of a deterministic algorithm for q = 4 and Σ = {s_1, s_2, s_3, s_4}. A dotted arrow
means that the computation stops and that the correct answer is systematically output. The leaves are the squared
nodes. (b) The reduced decision tree associated with the sequence ℓ = (s_3, s_4, s_1, s_2). The unseen edges are
represented by plain arrows.



We first consider the slightly simpler case where the algorithm A only uses strings obtained from previous
oracle calls as the argument of a query to O_2. In other words, we suppose that, whenever an internal node v
labelled by O_2(s, s') is reached, then both s and s' necessarily label some edge in the path from the root of the tree
to v (notice that this is the case for the algorithm of Figure 1(a)). We will discuss at the end of the proof how to
deal with the general case where A can also query O_2 on strings created by itself (e.g., on the all-zero string or on
strings taken randomly in Σ).

Let us fix a sequence ℓ = (σ_1, ..., σ_{|Σ|}) of distinct strings in Σ. Starting from the root u of the decision
tree (located at level i = 1), for each internal node located at level i ∈ {1, ..., t} we only keep the outgoing
branches labelled by strings σ_1, ..., σ_i, and we call the edge corresponding to σ_i an unseen edge (remember that
i ≤ q ≤ |Σ|). This construction gives a subtree of the decision tree rooted at u that we call the reduced decision
tree associated with ℓ. Note that this subtree has exactly one leaf. See Figure 1(b) for an illustration.

Let us fix p ∈ R and let G be either Z_{p^2} or Z_p^2, with the group operation denoted additively. We now describe
a process, invisible to the algorithm A, which constructs, using the sequence ℓ, a map π : G → Σ defining a
binary structure (q, O_1, O_2) for G. The map π is constructed "on the fly" during the computation. The algorithm
starts from the root and follows the computation through the reduced decision tree associated with ℓ. On a node
corresponding to a call to O_1, the oracle O_1 chooses a random element x of the group. If this element has not
already appeared, then π(x) is fixed to the string of the unseen edge of this node. The oracle O_1 outputs this
string to the algorithm A, while x is kept invisible to A. If the element x has already appeared, then the process
immediately stops; this is coherent with our convention that A stops whenever the same string is seen twice.
On a node corresponding to a call to O_2(s, s'), the elements x and x' such that π(x) = s and π(x') = s' have
necessarily been already obtained at a previous step, from our assumption. If the element x + x' has not already
appeared, then π(x + x') is fixed to the string of the unseen edge of this node. Otherwise the process stops. By
repeating this, the part of the map π related to the computation (i.e., the correspondence between elements and
strings for all the elements appearing in the computation) is completely defined by ℓ and by the elements chosen
by the oracle O_1. If necessary, the map π can then be completed. On the example of Figure 1(b), if the input is
Z_4 = {0, 1, 2, 3} and O_1 chooses the element 3, then the path followed is the path starting from the root labelled
by s_3, s_4, s_1, which defines π(3) = s_3, π(2) = s_4, and π(1) = s_1.
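
The "on the fly" labelling can be mimicked by a small helper that hands out the strings of the fixed sequence in order and reports a stop as soon as an element reappears. This is only a sketch of the bookkeeping, under the assumption that each answered query consumes the next string of the sequence; the class name `LazyLabeling` and its interface are hypothetical.

```python
class LazyLabeling:
    """Assigns strings from a fixed sequence to group elements on first use."""

    def __init__(self, sequence):
        self.sequence = list(sequence)   # the fixed sequence of distinct strings
        self.pi = {}                     # partial map: element -> string
        self.level = 0                   # number of queries answered so far

    def answer(self, element):
        """Label a fresh element with the next unseen string; None means 'stop'."""
        if element in self.pi:
            return None                  # repeated element: the process stops
        label = self.sequence[self.level]
        self.pi[element] = label
        self.level += 1
        return label

# Replaying the example on Z_4 with the sequence (s3, s4, s1, s2).
lab = LazyLabeling(["s3", "s4", "s1", "s2"])
first = lab.answer(3)              # O1 happens to choose the element 3
second = lab.answer((3 + 3) % 4)   # query O2(s3, s3) yields the element 2
third = lab.answer((2 + 3) % 4)    # query O2(s4, s3) yields the element 1
```

After these three queries the partial map sends 3, 2 and 1 to s3, s4 and s1, matching the example in the text.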

For a fixed sequence ℓ, let C_Y (resp. C_N) be the "on the fly" construction for Z_{p^2} (resp. Z_p^2) obtained by first
choosing p uniformly at random from R, and then defining π while running the algorithm, as detailed above. The
distribution D_Y (resp. D_N) coincides with the distribution that takes a sequence ℓ = (σ_1, ..., σ_{|Σ|}) of |Σ| strings in
Σ uniformly at random without repetition and then creates binary structures (q, O_1, O_2) using C_Y (resp. C_N). Thus,
to prove Proposition 1 it suffices to use the following lemma.

Lemma 2. Let ℓ be any fixed sequence of |Σ| distinct strings in Σ. If A decides correctly with probability larger
than 2/3 whether the input has been created using C_Y or using C_N, then t = Ω(q^{1/6}).

Proof of Lemma 2. Let v_1, ..., v_n be the set of nodes in the reduced decision tree associated with ℓ, and let S ⊆
{1, ..., n} (resp., T ⊆ {1, ..., n}) be the set of indexes i such that v_i is a query to O_2 (resp., to O_1). Notice that
|S| + |T| ≤ t. For each index j ∈ T, we set α_j as a random variable representing the element chosen by O_1 at
node v_j. Here, α_j ∈ Z_{p^2} when C_Y generates Z_{p^2}, and α_j ∈ Z_p^2 when C_N generates Z_p^2. Since only additions
are allowed as operations on the set {α_j}_{j∈T}, the output to a query v_i for i ∈ S can be expressed as π(a_i), where
a_i = Σ_{j∈T} k_j^i α_j is a linear combination of the variables in {α_j}_{j∈T}. Here all coefficients k_j^i are non-negative and
at least one coefficient must be positive.

We define the function a_{ii'} = a_i - a_{i'} = Σ_{j∈T} (k_j^i - k_j^{i'}) α_j for every i ≠ i' ∈ S. Without loss of generality,
we assume that each a_{ii'} is a nonzero polynomial (i.e., there exists at least one index j such that k_j^i ≠ k_j^{i'}). This
is because, otherwise, the element (and the string) appearing at node v_i is always the same as the element (and the
string) appearing at node v_{i'}, and thus one of the two nodes v_i and v_{i'} can be removed from the decision tree. For
any positive integer m, we say that a_{ii'} is constantly zero modulo m if m divides k_j^i - k_j^{i'} for all indexes j ∈ T. We
say that a prime p ∈ R is good if there exist i ≠ i' ∈ S such that the function a_{ii'} is constantly zero modulo p. We
say that p ∈ R is bad if, for all i ≠ i' ∈ S, the function a_{ii'} is not constantly zero modulo p (as shown later, when
p is bad, it is difficult to distinguish if the input is Z_{p^2} or Z_p^2). We denote by R_G(ℓ) ⊆ R the set of good primes.

We first suppose that |R_G(ℓ)| > |R|/6. Let M denote the value (|R|/(log q')^2)^{1/3}. Assume the existence of a subset
R'_G(ℓ) ⊆ R_G(ℓ) of size |R'_G(ℓ)| ≥ M such that there exist i ≠ i' ∈ S for which a_{ii'} is constantly zero modulo
p for every p ∈ R'_G(ℓ). Since all p ∈ R'_G(ℓ) are primes, and a_{ii'} is not the zero polynomial, a_{ii'} must have a
nonzero coefficient divisible by Π_{p∈R'_G(ℓ)} p. To create such a coefficient, we must have t ≥ log_2 Π_{p∈R'_G(ℓ)} p =
Ω(|R'_G(ℓ)| log q') = Ω((|R| log q')^{1/3}). Now assume that there exists no such subset R'_G(ℓ). Then, for each i ≠
i' ∈ S, at most M primes p have the property that a_{ii'} is constantly zero modulo p. This implies that |R_G(ℓ)| ≤
M · |S|(|S| - 1)/2 ≤ M · t(t - 1)/2. Since |R_G(ℓ)| > |R|/6, it follows that t = Ω((|R| log q')^{1/3}). Thus, for both
cases, we have t = Ω((|R| log q')^{1/3}) = Ω(q^{1/6}).

Hereafter we suppose that |R_G(ℓ)| ≤ |R|/6. Assume that the leaf of the reduced decision tree corresponds to a
YES decision. Recall that, if the computation does not reach the leaf, A always outputs the correct answer. From
these observations, we give the following upper bound on the overall success probability:

    [r + (1 - r)(p_Y · 1 + (1 - p_Y) · 1)]/2 + [r + (1 - r)(p_N · 1 + (1 - p_N) · 0)]/2 = [1 + r + (1 - r) p_N]/2,

where r = |R_G(ℓ)|/|R| is the probability of p being good, and p_Y (resp., p_N) is the probability that A does not reach
the leaf conditioned on the event that the instance is from C_Y (resp., from C_N) and p is a bad prime. Since
|R_G(ℓ)| ≤ |R|/6, the above success probability has upper bound 7/12 + (5/12) p_N. When the leaf of the reduced
decision tree corresponds to a NO decision, a similar calculation gives that the overall success probability is at
most 7/12 + (5/12) p_Y.

We now give an upper bound on p_Y and p_N. Let us fix p ∈ R \ R_G(ℓ). Since p is bad, each a_{ii'} for i ≠ i' ∈ S is
not constantly zero modulo p. When C_Y generates Z_{p^2}, the probability that a_{ii'} becomes 0 after substituting values
into {α_j}_{j∈T} is then exactly 1/p^2 (since the values of each α_j are uniformly distributed over Z_{p^2} and there is a unique
solution in Z_{p^2} to the equation a_{ii'} = 0 once all but one values are fixed). By the union bound, the probability
p_Y thus satisfies p_Y ≤ (|S|(|S| - 1)/2) · (1/p^2) ≤ t(t - 1)/(2p^2) ≤ 2t(t - 1)/q'^2, where the last inequality uses
p ≥ q'/2. Similarly, when C_N generates Z_p^2, the probability that a_{ii'} becomes 0 after substituting values into
{α_j}_{j∈T} is also exactly 1/p^2. Thus, the probability p_N also satisfies p_N ≤ 2t(t - 1)/q'^2.

To achieve overall success probability at least 2/3, we must have either p_Y ≥ 1/5 or p_N ≥ 1/5, and thus
t = Ω(q') = Ω(q^{1/2}). □
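
The collision probabilities used in the proof above can be checked by brute force for a tiny prime. The sketch below enumerates all assignments of the random group elements and computes the exact probability that a fixed integer linear combination vanishes; the function names and the coefficient vectors are illustrative assumptions. When some coefficient is not divisible by p, both groups give probability exactly 1/p^2. When every coefficient is divisible by p (a "constantly zero modulo p" combination), the two groups behave differently, which is why such primes must be treated separately.

```python
from fractions import Fraction
from itertools import product

def zero_probability_cyclic(coeffs, p):
    """Pr[sum c_j * a_j = 0 in Z_{p^2}] for independent uniform a_j in Z_{p^2}."""
    n = p * p
    hits = sum(1 for a in product(range(n), repeat=len(coeffs))
               if sum(c * x for c, x in zip(coeffs, a)) % n == 0)
    return Fraction(hits, n ** len(coeffs))

def zero_probability_product(coeffs, p):
    """Same probability for independent uniform a_j in Z_p x Z_p."""
    # The two coordinates are independent, so the event factors over them.
    hits = sum(1 for a in product(range(p), repeat=len(coeffs))
               if sum(c * x for c, x in zip(coeffs, a)) % p == 0)
    per_coordinate = Fraction(hits, p ** len(coeffs))
    return per_coordinate ** 2
```

For p = 3, the coefficients (2, 5) give probability 1/9 in both groups, while the coefficients (3, 6) give 1/3 in Z_9 and 1 in Z_3 × Z_3.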

Finally, we briefly explain how to deal with the general case where A can make binary strings by itself and use
them as arguments to O_2. The difference is that now a string s not seen before can appear as an argument to O_2.
Basically, what we need to change is the following two points. First, in the "on the fly" construction of π from ℓ, if
such a query appears then an element x is taken uniformly at random from the set of elements of the input group
not already labelled, and the identification π(x) = s is done. Second, in the proof of Lemma 2, another random
variable is introduced to represent the element associated with s. With these modifications the same lower bound
t = Ω(q^{1/6}) holds.

This concludes the proof of Proposition 1. □

4 A Lower Bound for Testing the Number of Generators in a Group 

In this section we show that, even if the size of the ground set F is known, it is hard to test whether (F, o) is close
to an abelian group generated by k elements for any value k ≥ 2. We prove the following theorem using a method
similar to the proof of Theorem 1. See the Appendix for details.

Theorem 2. Let k ≥ 2 be an integer and suppose that ε ≤ 1/23. Then the query complexity of any ε-tester for the
class of abelian groups generated by k elements is

    Ω(|F|^{1/6 - 2/(6(3k+2))})   if k is even,
    Ω(|F|^{1/6 - 4/(6(3k+1))})   if k is odd.

Moreover, these bounds hold with respect to either the Hamming distance or the edit distance, and even when |F|
is known.

5 Testing if the Input is Cyclic when |F| is Known

In this section we study the problem of testing, when |F| is known, if the input (F, o) is a cyclic group or is far from
the class of cyclic groups. Let us denote m = |F|, and suppose that we also know its factorization m = p_1^{e_1} ··· p_r^{e_r},
where the p_i's are distinct primes. Let C_m = {0, ..., m - 1} be the cyclic group of integers modulo m and, for
any i ∈ {1, ..., r}, denote by C_{m,i} = {0, m/p_i, ..., (p_i - 1)m/p_i} its subgroup of order p_i. The group operation in C_m
is denoted additively.

For any γ ∈ F, we now define a map f_γ : C_m → F such that f_γ(a) represents the a-th power of γ. Since the
case where o is not associative has to be taken into consideration and since we want to evaluate f_γ efficiently, this map
is defined using the following rules.

    f_γ(1) = γ
    f_γ(a) = γ o f_γ(a - 1)          if 2 ≤ a ≤ m - 1 and a is odd
    f_γ(a) = f_γ(a/2) o f_γ(a/2)     if 2 ≤ a ≤ m - 1 and a is even
    f_γ(0) = γ o f_γ(m - 1)

The value of f_γ(a) can then be computed with O(log m) uses of the operation o. Notice that if (F, o) is a group,
then f_γ(a) = γ^a for any a ∈ {0, ..., m - 1}.
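
The recursive rules translate directly into code. The following is a minimal sketch, assuming m ≥ 2 and assuming the base case that the power map sends 1 to the chosen element itself, which the recursion needs in order to terminate; the name `make_power_map` and the caching of intermediate values are choices made for this example.

```python
def make_power_map(gamma, op, m):
    """Return f with f(a) following the recursive rules for a in {0, ..., m-1}.

    Assumes m >= 2 and the base case f(1) = gamma.
    """
    cache = {1: gamma}

    def f(a):
        if a not in cache:
            if a == 0:
                cache[a] = op(gamma, f(m - 1))      # f(0) = gamma o f(m - 1)
            elif a % 2 == 1:
                cache[a] = op(gamma, f(a - 1))      # odd a: one extra factor
            else:
                half = f(a // 2)
                cache[a] = op(half, half)           # even a: square the half power
        return cache[a]

    return f
```

Each even step halves the argument and each odd step is followed by an even one, so a single evaluation uses at most about 2 log_2 m applications of the operation, in line with the O(log m) bound stated above.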

For any ε > 0, our ε-tester for cyclic groups is denoted CyclicTest_ε and is described in Figure 2. The input
(F, o) is given as a binary structure (q, O_1, O_2) with q ≥ m. In the description of Figure 2, operations in (F, o),
such as taking a random element or computing the product of two elements, are implicitly performed by using
the oracles O_1 and O_2. The correctness of this algorithm and upper bounds on its complexity are shown in the
following theorem. A proof is given in the Appendix.
Algorithm CyclicTest$_\epsilon$

INPUT: a magma $(\Gamma, \circ)$ given as a binary structure $(q, O_1, O_2)$
       the size $m = |\Gamma|$ and its factorization $m = p_1^{e_1} \cdots p_r^{e_r}$

 1  decision $\leftarrow$ FAIL; counter $\leftarrow$ 0;
 2  while decision = FAIL and counter $\le d_1 = \Theta(\log\log m)$ do
 3      decision $\leftarrow$ PASS;
 4      Take an element $\gamma$ uniformly at random in $\Gamma$;
 5      Repeat the following test $d_2 = \Theta(\epsilon^{-1} \log\log\log m)$ times:
 6          take two elements $x, y$ uniformly at random in $C_m$;
 7          if $f_\gamma(x + y) \ne f_\gamma(x) \circ f_\gamma(y)$ then decision $\leftarrow$ FAIL;
 8      for $i \in \{1, \dots, r\}$ do
 9          take two arbitrary distinct elements $x, y$ in $C_{m,i}$;
10          take $d_3 = \Theta(\log\log\log m)$ elements $u_1, \dots, u_{d_3}$ at random in $C_m$;
11          if there exists $j \in \{1, \dots, d_3\}$ such that $f_\gamma(x + u_j) = f_\gamma(y + u_j)$
12              then decision $\leftarrow$ FAIL;
13      counter $\leftarrow$ counter + 1;
14  output decision;

Figure 2: Algorithm CyclicTest$_\epsilon$.
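The structure of Figure 2 can be sketched in Python as follows. This is not the authors' implementation: the oracles $O_1$ and $O_2$ are replaced by an explicit list `elements` and a callable `op`, the repetition counts `d1`, `d2`, `d3` are supplied by the caller because the constants hidden in the $\Theta(\cdot)$ bounds are not specified, and the pair taken in $C_{m,i}$ is fixed to $(0, m/p_i)$.

```python
import random


def f_gamma(gamma, a, m, op):
    """a-th power of gamma under op, following the recursive rules of Section 5."""
    if a == 0:
        return op(gamma, f_gamma(gamma, m - 1, m, op))
    if a == 1:
        return gamma
    if a % 2 == 1:
        return op(gamma, f_gamma(gamma, a - 1, m, op))
    half = f_gamma(gamma, a // 2, m, op)
    return op(half, half)


def cyclic_test(elements, op, primes, d1, d2, d3, rng=random):
    """Sketch of CyclicTest: elements is the ground set, primes the p_i, m >= 2."""
    m = len(elements)
    for _ in range(d1):                       # Steps 2 and 13: at most d1 rounds
        decision = "PASS"                     # Step 3
        gamma = rng.choice(elements)          # Step 4
        f = lambda a: f_gamma(gamma, a % m, m, op)
        for _ in range(d2):                   # Steps 5-7: homomorphism test
            x, y = rng.randrange(m), rng.randrange(m)
            if f(x + y) != op(f(x), f(y)):
                decision = "FAIL"
        for p in primes:                      # Steps 8-12: collision test
            x, y = 0, m // p                  # two distinct elements of C_{m,i}
            for _ in range(d3):
                u = rng.randrange(m)
                if f(x + u) == f(y + u):
                    decision = "FAIL"
        if decision == "PASS":
            return "PASS"
    return "FAIL"                             # Step 14
```

On $\mathbb{Z}_2 \times \mathbb{Z}_2$ every choice of $\gamma$ satisfies $f_\gamma(u) = f_\gamma(2 + u)$, so the sketch rejects in every round, whereas on $\mathbb{Z}_{12}$ it accepts as soon as a generator is drawn.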

Theorem 3. For any value $\epsilon > 0$, Algorithm CyclicTest$_\epsilon$ is an $\epsilon$-tester for cyclic groups with
respect to both the edit distance and the Hamming distance. Its query and time complexities are
\[ O\left(\left(\log m + \frac{\log\log m}{\epsilon}\right) \cdot \log q \cdot \log\log\log m\right). \]
Appendix

A. Proof of Lemma 1

The idea of this proof has been communicated to us by Ivanyos [13]. Work on other aspects of the distance between
non-isomorphic groups has subsequently been the subject of a joint paper [14].

We will use the following lemma, which is a weak version of Corollary 1 in Ref. [14].

Lemma 3. Let $(G, \circ)$ and $(H, *)$ be two groups such that $|G| \le |H|$. If $(G, \circ)$ is not isomorphic to a subgroup of
$(H, *)$, then
\[ \Pr_{x,y \in G}\left[\gamma(x \circ y) = \gamma(x) * \gamma(y)\right] \le \frac{7}{9} \]
for any injective map $\gamma : G \to H$.

We now present our proof of Lemma 1.

Proof of Lemma 1. We assume without loss of generality that $|G| \le |H|$ and prove the lemma by contraposition.
Namely, we show that $G$ and $H$ are isomorphic if $\mathrm{edit}((G, \circ), (H, *)) < |H|^2/23$.

Suppose that $\mathrm{edit}((G, \circ), (H, *)) \le \delta|H|^2$, where $\delta < 1/23$. Let $T_G : U_G \times U_G \to \mathbb{N}$ and $T_H : U_H \times U_H \to \mathbb{N}$
be multiplication tables of $G$ and $H$, respectively, such that the edit distance between $T_G$ and $T_H$ is at most $\delta|H|^2$.
Here, $U_G$ and $U_H$ are subsets of $\mathbb{N}$ of size $|G|$ and $|H|$, respectively. Let $\sigma_G : U_G \to G$ and $\sigma_H : U_H \to H$ be the
bijections associated with $T_G$ and $T_H$, respectively.

First notice that $|G| \ge (1 - \delta)|H|$. Otherwise, at least $\delta|H|$ elements should be added to $T_G$ to obtain the table
$T_H$, which would cost at least
\[ \sum_{i=1}^{\delta|H|} (2|H| - 2i + 1) = 2\delta|H|^2 - \delta|H|(\delta|H| + 1) + \delta|H| = \delta(2 - \delta)|H|^2 > \delta|H|^2 \]
operations.

We now consider the transition from $T_G$ to $T_H$ through the process of computing the edit distance. Observe
that the number of removed elements through the transition is at most $\delta|G|$, otherwise it would cost more than
\[ \sum_{i=1}^{\delta|G|} (2|G| - 2i + 1) = 2\delta|G|^2 - \delta|G|(\delta|G| + 1) + \delta|G| = \delta(2 - \delta)|G|^2 \ge \delta(2 - \delta)(1 - \delta)^2|H|^2 > \delta|H|^2 \]
operations. Let $S \subseteq U_G$ be the set of elements that are not removed in the transition and define $U = \{\sigma_G(s) \mid s \in S\} \subseteq G$.
From the argument above, we have $|U| \ge (1 - \delta)|G|$.
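The two cost bounds above rest on a closed form for the same sum and on the fact that $(2 - \delta)(1 - \delta)^2 > 1$ when $\delta < 1/23$. A small sketch in Python that checks both; the function names are illustrative, and exact rational arithmetic is used only to avoid rounding questions.

```python
from fractions import Fraction

DELTA_MAX = Fraction(1, 23)  # the threshold on delta used in the proof


def summed_cost(n, size):
    """Sum over i = 1..n of (2*size - 2*i + 1), as in the two displayed bounds."""
    return sum(2 * size - 2 * i + 1 for i in range(1, n + 1))


def closed_form(n, size):
    """The same quantity as n * (2*size - n); with n = delta*size this is
    delta * (2 - delta) * size^2."""
    return n * (2 * size - n)


def margin(delta):
    """(2 - delta) * (1 - delta)^2, which must exceed 1 for the last inequality."""
    return (2 - delta) * (1 - delta) ** 2
```

At $\delta = 1/23$ the margin equals $21780/12167 \approx 1.79$, and it only grows as $\delta$ decreases toward 0.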

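We define a map $f : G \to H$ as follows. For $x \in U$, $f(x) = \sigma_H(\sigma_G^{-1}(x))$. For $x \notin U$, we choose $f(x)$ so
that $f$ becomes an injective map (this is possible since $|G| \le |H|$). Suppose that, for two elements $x, y \in U$,
the element $x \circ y$ is in $U$. Also, suppose that the value $T_G(\sigma_G^{-1}(x), \sigma_G^{-1}(y))$ was not modified in the transition, i.e.,
$T_G(\sigma_G^{-1}(x), \sigma_G^{-1}(y)) = T_H(\sigma_G^{-1}(x), \sigma_G^{-1}(y))$. In this case,
\[
\begin{aligned}
\sigma_H^{-1}(f(x) * f(y)) &= T_H(\sigma_H^{-1}(f(x)), \sigma_H^{-1}(f(y))) \\
&= T_H(\sigma_G^{-1}(x), \sigma_G^{-1}(y)) \\
&= T_G(\sigma_G^{-1}(x), \sigma_G^{-1}(y)) \\
&= \sigma_G^{-1}(x \circ y).
\end{aligned}
\]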
Thus, we have $f(x) * f(y) = \sigma_H(\sigma_G^{-1}(x \circ y)) = f(x \circ y)$. Since the number of exchange operations done to the
table $T_G$ is at most $\delta|H|^2 \le \delta|G|^2/(1 - \delta)^2$, by the union bound we obtain
\[ \Pr_{x,y \in G}\left[f(x \circ y) = f(x) * f(y)\right] \ge 1 - 3\delta - \delta/(1 - \delta)^2 > 1 - 5\delta. \]
Thus, since $5\delta < 2/9$, Lemma 3 implies that the group $(G, \circ)$ is isomorphic to a subgroup of $(H, *)$. If $(G, \circ)$ is
isomorphic to a proper subgroup of $(H, *)$, then $|G| \le |H|/2$, which contradicts the fact that $|G| \ge (1 - \delta)|H|$.
Thus, $(G, \circ)$ is indeed isomorphic to $(H, *)$. □
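The numerical chain at the end of this proof, $1 - 3\delta - \delta/(1-\delta)^2 > 1 - 5\delta > 7/9$, can be checked for concrete values of $\delta$. The sketch below is only a sanity check with illustrative names; it uses exact fractions and makes no claim beyond the values it is given.

```python
from fractions import Fraction


def agreement_lower_bound(delta):
    """1 - 3*delta - delta / (1 - delta)^2, the bound given by the union bound."""
    return 1 - 3 * delta - delta / (1 - delta) ** 2


def chain_holds(delta):
    """Check 1 - 3d - d/(1-d)^2 > 1 - 5d > 7/9 for a given delta."""
    return agreement_lower_bound(delta) > 1 - 5 * delta > Fraction(7, 9)
```

The chain holds for $\delta = 1/24$ but already fails for $\delta = 1/20$, where $1 - 5\delta = 3/4 < 7/9$.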

B. Proof of Theorem 2

To show the lower bound, we use Yao's minimax principle as in the proof of Theorem 1. We introduce two
distributions $\mathcal{D}_Y$ and $\mathcal{D}_N$ such that every instance in $\mathcal{D}_Y$ is generated by $k$ elements while every instance in $\mathcal{D}_N$ is
far from abelian groups generated by $k$ elements. Moreover, all instances in $\mathcal{D}_Y$ and $\mathcal{D}_N$ have the same order. Then
we construct the input distribution $\mathcal{D}$ as the distribution that takes an instance from $\mathcal{D}_Y$ with probability 1/2 and
from $\mathcal{D}_N$ with probability 1/2. By showing that any deterministic algorithm requires many queries to distinguish
them, we obtain the desired result.

We first consider the case where $k$ is even. Let $r \ge 2$ be a fixed integer and denote $k = 2r - 2$. For any fixed
(and known) prime $p$, we define $\mathcal{D}_Y$ as the distribution over binary structures for the group $\mathbb{Z}_{p^2}^{r} \times \mathbb{Z}_p^{r-2}$ where the
injective map $\pi$ hidden behind the group oracles is chosen uniformly at random. We define $\mathcal{D}_N$ as the uniform
distribution over binary structures for $\mathbb{Z}_{p^2}^{r-1} \times \mathbb{Z}_p^{r}$ in the same manner. The order of every instance in $\mathcal{D}_Y$ and $\mathcal{D}_N$ is
$p^{3r-2}$. Every instance in $\mathcal{D}_Y$ has $2r - 2 = k$ generators while every instance in $\mathcal{D}_N$ needs at least $2r - 1 = k + 1$
elements to be generated. Moreover, from Lemma 1, every instance in $\mathcal{D}_N$ is 1/23-far from groups of $k$ generators.
The part of Theorem 2 for $k$ even then follows from the following proposition.
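The bookkeeping for these instances is easy to check mechanically. The sketch below assumes that the positive and negative instances for $k = 2r - 2$ are $\mathbb{Z}_{p^2}^{r} \times \mathbb{Z}_p^{r-2}$ and $\mathbb{Z}_{p^2}^{r-1} \times \mathbb{Z}_p^{r}$, respectively; the function names are illustrative.

```python
def parameters(a, b, p):
    """Invariants of Z_{p^2}^a x Z_p^b for a prime p.

    Returns (order, minimum number of generators, number of x with p*x = 0).
    Each cyclic factor contributes exactly p solutions of p*x = 0.
    """
    order = p ** (2 * a + b)
    rank = a + b
    killed_by_p = p ** (a + b)
    return order, rank, killed_by_p


def even_case(r, p):
    """Positive and negative instance groups assumed for k = 2r - 2."""
    return parameters(r, r - 2, p), parameters(r - 1, r, p)
```

For $r = 3$ and $p = 5$ both groups have order $5^7$, the positive one needs 4 generators and the negative one 5.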

Proposition 2. Any deterministic algorithm that decides with probability larger than 2/3 whether the input is from
the distribution $\mathcal{D}_Y$ or from the distribution $\mathcal{D}_N$ must use $\Omega(\sqrt{p^{r-1}})$ queries.

Proof. Let us consider the decision tree associated with a deterministic algorithm $\mathcal{A}$ using $t$ queries. As in Section
3, we rely on the fact that the distribution of instances generated by $\mathcal{D}$ can be created through a more convenient
"on the fly" construction of $\pi$ using a random sequence $\ell$ of strings. We suppose hereafter that $\ell$ is fixed and denote
by $\mathcal{C}_Y$ (resp., $\mathcal{C}_N$) the associated construction of positive (resp., negative) instances. We assume again that, when
$\mathcal{A}$ goes through an edge corresponding to a string already seen during the computation, then $\mathcal{A}$ immediately stops
and outputs the correct answer (this modification only improves the ability of $\mathcal{A}$).

We denote again by $v_1, \dots, v_n$ the set of nodes in the reduced decision tree associated with $\ell$, and by $S \subseteq \{1, \dots, n\}$
(resp., $T \subseteq \{1, \dots, n\}$) the set of indexes $i$ such that $v_i$ is a query to $O_2$ (resp., $O_1$). Notice that
$|S| + |T| \le t$. For each $j \in T$, we set $a_j$ as a random variable representing the element obtained by performing
a query to $O_1$. The answer to a query $v_i$ for $i \in S$ can be expressed as $\pi(\alpha_i)$ where $\alpha_i = \sum_{j \in T} k_j^i a_j$ is a linear
combination of the variables $\{a_j\}_{j \in T}$. We define the function $\alpha_{ii'} = \alpha_i - \alpha_{i'} = \sum_{j \in T} (k_j^i - k_j^{i'}) a_j$ for every
$i \ne i' \in S$. Remember that, for any positive integer $m$, we say that $\alpha_{ii'}$ is constantly zero modulo $m$ if $m$ divides
$k_j^i - k_j^{i'}$ for all indexes $j \in T$. Note that we can suppose without loss of generality that for all indexes $i \ne i' \in S$
the function $\alpha_{ii'}$ is not constantly zero modulo $p^2$ (otherwise it would give no useful information since $p^2 x = 0$ for
any element $x$ in an instance created by $\mathcal{C}_Y$ or $\mathcal{C}_N$).

Suppose that the leaf of the reduced decision tree associated with $\ell$ corresponds to a YES decision. The success
probability of the algorithm $\mathcal{A}$ for this fixed sequence $\ell$ is at most
\[ \frac{1}{2}\left(p_Y \cdot 1 + (1 - p_Y) \cdot 1\right) + \frac{1}{2}\left(p_N \cdot 1 + (1 - p_N) \cdot 0\right) = \frac{1}{2}(1 + p_N), \]
where $p_Y$ (resp., $p_N$) is the probability that $\mathcal{A}$ does not reach the leaf conditioned on the event that the instance
is from $\mathcal{C}_Y$ (resp., from $\mathcal{C}_N$). When the leaf of the reduced decision tree corresponds to a NO decision, a similar
calculation gives that the success probability is at most $\frac{1}{2}(1 + p_Y)$. Notice that $p_Y$ and $p_N$ are the probabilities that
the same string is seen twice during the computation. We will now show that, when the instance is created by either
$\mathcal{C}_Y$ or $\mathcal{C}_N$, the inequality
\[ \Pr\left[\exists\, i \ne i' \in S \text{ such that } \sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] \le \frac{t^2}{2 \cdot p^{r-1}} \]
holds. This implies that $\max(p_Y, p_N) \le \frac{t^2}{2 \cdot p^{r-1}}$ and then the algorithm $\mathcal{A}$ cannot distinguish $\mathcal{C}_Y$ from $\mathcal{C}_N$ with
probability at least 2/3 unless $t = \Omega(\sqrt{p^{r-1}})$.

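Let us fix some pair of indexes $i \ne i' \in S$. If there exists some index $j \in T$ such that $k_j^i - k_j^{i'} \not\equiv 0 \pmod{p}$, then
for instances generated by $\mathcal{C}_Y$ and $\mathcal{C}_N$ we have
\[ \Pr_{\{a_j\}_{j \in T}}\left[\sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] = \frac{1}{p^{3r-2}}. \qquad (1) \]
Now suppose that $k_j^i - k_j^{i'} \equiv 0 \pmod{p}$ for all $j \in T$. Since there are $p^{2r-2}$ elements of order at most $p$ in $\mathbb{Z}_{p^2}^{r} \times \mathbb{Z}_p^{r-2}$
and $p^{2r-1}$ elements of order at most $p$ in $\mathbb{Z}_{p^2}^{r-1} \times \mathbb{Z}_p^{r}$, for instances generated by $\mathcal{C}_Y$ and $\mathcal{C}_N$ we have
\[ \Pr_{\{a_j\}_{j \in T}}\left[\sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] \le \frac{p^{2r-1}}{p^{3r-2}}. \qquad (2) \]
The union bound then implies that
\[ \Pr\left[\exists\, i \ne i' \in S \text{ such that } \sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] \le \frac{t^2}{2 \cdot p^{r-1}} \]
in both cases.

Since the same argument holds for any sequence $\ell$, we conclude that the algorithm $\mathcal{A}$ cannot distinguish $\mathcal{D}_Y$
from $\mathcal{D}_N$ with overall success probability at least 2/3 unless $t = \Omega(\sqrt{p^{r-1}})$. □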
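To see how the collision bound turns into a query lower bound, one can compute the smallest $t$ for which the upper bound on the success probability reaches 2/3. The sketch below assumes the union bound has the form $t^2/(2p^{r-1})$ and the success probability is at most half of one plus that quantity; the function names are illustrative.

```python
import math


def collision_bound(t, p, r):
    """Union bound t^2 / (2 p^(r-1)) on seeing the same string twice."""
    return t * t / (2 * p ** (r - 1))


def success_upper_bound(t, p, r):
    """Upper bound (1 + collision bound) / 2 on the success probability, capped at 1."""
    return min(1.0, (1 + collision_bound(t, p, r)) / 2)


def min_queries(p, r):
    """Smallest integer t with 3 t^2 >= 2 p^(r-1), i.e. success bound >= 2/3."""
    target = 2 * p ** (r - 1)
    t = math.isqrt(target // 3)
    while 3 * t * t < target:
        t += 1
    return t
```

The threshold grows like $\sqrt{2p^{r-1}/3}$, which is the $\Omega(\sqrt{p^{r-1}})$ behaviour stated in the proposition.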



We now consider the case where $k$ is odd. Let us fix $r \ge 2$ and denote $k = 2r - 1$. We define similarly $\mathcal{D}_Y$
as the uniform distribution over binary structures for the group $\mathbb{Z}_{p^2}^{r} \times \mathbb{Z}_p^{r-1}$, and $\mathcal{D}_N$ as the uniform distribution
over binary structures for $\mathbb{Z}_{p^2}^{r-1} \times \mathbb{Z}_p^{r+1}$. The order of every instance in $\mathcal{D}_Y$ and $\mathcal{D}_N$ is $p^{3r-1}$. Every instance in $\mathcal{D}_Y$
has $2r - 1 = k$ generators while every instance in $\mathcal{D}_N$ needs at least $2r = k + 1$ elements to be generated. From
Lemma 1, every instance in $\mathcal{D}_N$ is 1/23-far from abelian groups generated by $k$ generators. The part of Theorem 2
for $k$ odd follows from the following proposition.

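Proposition 3. Any deterministic algorithm that decides with probability larger than 2/3 whether the input is from
the distribution $\mathcal{D}_Y$ or from the distribution $\mathcal{D}_N$ must use $\Omega(\sqrt{p^{r-1}})$ queries.

Proof. The proof is exactly the same as the proof of Proposition 2 except that Equality (1) becomes
\[ \Pr_{\{a_j\}_{j \in T}}\left[\sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] = \frac{1}{p^{3r-1}} \]
and Inequality (2) becomes
\[ \Pr_{\{a_j\}_{j \in T}}\left[\sum_{j \in T} (k_j^i - k_j^{i'}) a_j = 0\right] \le \frac{p^{2r}}{p^{3r-1}}. \]
□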
C. Proof of Theorem 3

The proof of Theorem 3 relies on the following theorem.

Theorem 4. Let $(\Gamma, \circ)$ be a magma and let $\eta$ be a constant such that $\eta < 1/120$. Let $G$ be a (not necessarily
abelian) group with order $|G| = |\Gamma|$ in which the multiplication of two elements $x, y$ is denoted by $xy$. Let $f$ denote
a map from $G$ to $\Gamma$. Suppose that the following two conditions are satisfied:

(a) $\Pr_{x,y \in G}[f(xy) = f(x) \circ f(y)] \ge 1 - \eta$;

(b) for any subgroup $H \ne \{e\}$ of $G$ there exist two distinct elements $x, y \in H$ such that the inequality
$\Pr_{u \in G}[f(xu) = f(yu)] < 1/2$ holds.

Then there exists a binary operation $* : \Gamma \times \Gamma \to \Gamma$ such that $(\Gamma, *)$ is a group isomorphic to $G$ and such that
$\mathrm{Ham}_\Gamma(\circ, *) \le 46\eta|G|^2$.

We need an auxiliary lemma to prove Theorem 4.

Suppose that $(\Gamma, \circ)$ is a magma, $\eta$ is a constant such that $0 < \eta < 1/120$, $G$ is a (not necessarily abelian) group,
and $f$ is a map from $G$ to $\Gamma$. The order of $G$ does not matter for now. The multiplication of two elements $x, y \in G$
is denoted by $xy$. Following definitions introduced in the work by Friedl et al. [10], we say that an element $x$ of $G$ is
well-behaving if both inequalities $\Pr_{u \in G}[f(xu) = f(x) \circ f(u)] > 4/5$ and $\Pr_{u \in G}[(f(x) \circ f(u)) \circ f(u^{-1}) = f(x)] > 4/5$
hold. Friedl et al. showed the following results.
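For a small explicit group, the well-behaving condition can be checked by enumerating all $u$. The sketch below is an illustration under stated assumptions: the group is given as a list with callables `mul` and `inv`, the map $f$ and the magma operation are callables `f` and `op`, and none of these names come from Ref. [10].

```python
from fractions import Fraction


def is_well_behaving(x, group, mul, inv, f, op):
    """Check both 4/5 conditions for x by enumerating every u in the group."""
    n = len(group)
    # Number of u with f(xu) = f(x) o f(u).
    first = sum(f(mul(x, u)) == op(f(x), f(u)) for u in group)
    # Number of u with (f(x) o f(u)) o f(u^{-1}) = f(x).
    second = sum(op(op(f(x), f(u)), f(inv(u))) == f(x) for u in group)
    threshold = Fraction(4, 5)
    return Fraction(first, n) > threshold and Fraction(second, n) > threshold
```

When $f$ is an exact homomorphism into a group, every element passes both conditions with probability 1; corrupting a single value of $f$ is enough to make that element fail.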

Lemma 4 (Lemmas 1-6 of [10]). Suppose that
\[ \Pr_{x,y \in G}[f(xy) = f(x) \circ f(y)] \ge 1 - \eta. \qquad (3) \]
Then $\Pr_{x \in G}[x \text{ is not well-behaving}] \le 15\eta$. Moreover, there exists a normal subgroup $K$ of $G$ such that, for any
$x, y \in G$:

(i) if $Kx = Ky$ then $\Pr_{u \in G}[f(xu) = f(yu)] \ge 1 - 4\eta$;

(ii) if $Kx \ne Ky$ then $\Pr_{u \in G}[f(xu) = f(yu)] \le 4\eta$;

(iii) $f(x) \ne f(y)$ for any two well-behaving elements $x$ and $y$ of $G$ such that $Kx \ne Ky$.

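We now give the proof of Theorem 4. The idea is similar to the one used in the proof of Theorem 2 in Ref. [10].

Proof of Theorem 4. Suppose that all the conditions of Theorem 4 are satisfied. We explicitly construct a binary
operation $* : \Gamma \times \Gamma \to \Gamma$ such that $(\Gamma, *)$ is isomorphic to $G$ and such that the Hamming distance between $(\Gamma, \circ)$
and $(\Gamma, *)$ is at most $46\eta|G|^2$.

Let $K$ denote the subgroup of $G$ whose existence is ensured by Lemma 4. From the properties of $K$ stated in
Lemma 4 and from Condition (b) in the statement of Theorem 4 we conclude that $K = \{e\}$. Indeed, if $K \ne \{e\}$, then
Condition (b) applied to $H = K$ gives two distinct elements $x, y \in K$ with $\Pr_{u \in G}[f(xu) = f(yu)] < 1/2$, whereas
$Kx = Ky$ and Property (i) of Lemma 4 give $\Pr_{u \in G}[f(xu) = f(yu)] \ge 1 - 4\eta > 1/2$.

Let $\Gamma_1 = \{f(x) \mid x \text{ is a well-behaving element of } G\} \subseteq \Gamma$ and define $\Gamma_2 = \Gamma \setminus \Gamma_1$. Notice that $|\Gamma_1|$ is equal to
the number of well-behaving elements of $G$, by Property (iii) of Lemma 4 with $K = \{e\}$.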

We now define a one-one map $\bar{f} : G \to \Gamma$ as follows. If $x \in G$ is well-behaving, then $\bar{f}(x) = f(x)$; if $x \in G$
is not well-behaving then $\bar{f}(x)$ is an element in $\Gamma_2$ chosen arbitrarily in such a way that $\bar{f}(x) \ne \bar{f}(y)$ for distinct not
well-behaving elements $x, y$ of $G$.

We define the multiplication $*$ over $\Gamma$ as follows. For any $\alpha, \beta \in \Gamma$, there exist (unique) $x_\alpha$ and $x_\beta$ in $G$ such
that $\alpha = \bar{f}(x_\alpha)$ and $\beta = \bar{f}(x_\beta)$. We then set $\alpha * \beta = \bar{f}(x_\alpha x_\beta)$. With this definition, the map $\bar{f}$ becomes an
isomorphism from $G$ to $(\Gamma, *)$.

We now show the following inequality:
\[ \Pr_{x,y \in G}\left[\bar{f}(x) * \bar{f}(y) \ne \bar{f}(x) \circ \bar{f}(y)\right] \le 46\eta. \qquad (4) \]
By definition of $*$, we have $\bar{f}(x) * \bar{f}(y) = \bar{f}(xy)$. With probability at least $1 - 45\eta$ the three elements $x$, $y$, and
$xy$ are well-behaving elements (from Lemma 4), in which case $\bar{f}(x) = f(x)$, $\bar{f}(y) = f(y)$, and $\bar{f}(xy) = f(xy)$.
Remember that we also know that with probability at least $1 - \eta$ the equality $f(xy) = f(x) \circ f(y)$ holds. Then the
equality $\bar{f}(x) * \bar{f}(y) = \bar{f}(x) \circ \bar{f}(y)$ holds with probability at least $1 - 46\eta$.

Since $\bar{f}$ is one-one from $G$ to $\Gamma$, Inequality (4) implies that $\mathrm{Ham}_\Gamma(\circ, *) \le 46\eta|\Gamma|^2$. □
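The construction of $*$ by transporting the group law along a bijection, and the Hamming distance it is compared against, can be illustrated on a toy example. This is a minimal sketch with illustrative names; the bijection is passed as a callable `fbar` and is assumed to be onto the ground set.

```python
def transported_operation(group, mul, fbar):
    """Build * on the image of fbar so that fbar(x) * fbar(y) = fbar(xy).

    fbar must be a bijection from the group onto the ground set.
    """
    preimage = {fbar(x): x for x in group}
    return lambda a, b: fbar(mul(preimage[a], preimage[b]))


def hamming_distance(ground, op, star):
    """Number of pairs (a, b) on which the two operations disagree."""
    return sum(op(a, b) != star(a, b) for a in ground for b in ground)
```

By construction `fbar` is an isomorphism onto the new structure, so the only remaining question is on how many pairs the original operation differs from it.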

We are now ready to give the proof of Theorem 3.

Proof of Theorem 3. Since any $\epsilon$-tester with respect to the Hamming distance is also an $\epsilon$-tester with respect to the
edit distance, we consider hereafter the Hamming distance.

Suppose that the input $(\Gamma, \circ)$ is a cyclic group of order $m$. Suppose that the element $\gamma$ chosen at Step 4 is a
generator of $(\Gamma, \circ)$. Then $\Pr_{x,y \in C_m}[f_\gamma(x + y) = f_\gamma(x) \circ f_\gamma(y)] = 1$ and $\Pr_{u \in C_m}[f_\gamma(x + u) = f_\gamma(y + u)] = 0$
for any $i \in \{1, \dots, r\}$ and any distinct $x, y \in C_{m,i}$. Thus the value of the variable decision at the end of the loop
of Steps 3-13 for this specific value of $\gamma$ will always be PASS. Since with probability $\Omega(1/\log\log m)$ an element
chosen uniformly at random in a cyclic group of order $m$ is a generator (see for example Ref. [3]), by taking an
appropriate value $d_1 = \Theta(\log\log m)$ the algorithm outputs PASS with probability at least 2/3.
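The fraction of generators in a cyclic group of order $m$ equals $\varphi(m)/m$, so the number of rounds needed to hit one can be estimated directly for small $m$. A minimal sketch, with illustrative function names and a brute-force count that is only meant for small inputs:

```python
from math import gcd


def generator_fraction(m):
    """Fraction phi(m)/m of elements of Z_m that generate the whole group."""
    return sum(gcd(a, m) == 1 for a in range(m)) / m


def miss_probability(m, rounds):
    """Probability that none of `rounds` independent uniform draws is a generator."""
    return (1 - generator_fraction(m)) ** rounds
```

For $m = 12$ one third of the elements are generators, so three independent draws already miss all of them with probability below 1/3.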

Now suppose that $(\Gamma, \circ)$ is $\epsilon$-far from the class of cyclic groups and let $\gamma$ be any element of $\Gamma$. Denote $\tilde{\epsilon} =
\min(\epsilon, 46/120)$ and suppose that the following two assertions hold:

(i) $\Pr_{x,y \in C_m}[f_\gamma(x + y) = f_\gamma(x) \circ f_\gamma(y)] \ge 1 - \tilde{\epsilon}/46$;

(ii) for each index $i \in \{1, \dots, r\}$, there exist two distinct elements $x, y \in C_{m,i}$ such that
$\Pr_{u \in C_m}[f_\gamma(x + u) = f_\gamma(y + u)] < \frac{1}{2}$.

Notice that any nontrivial subgroup $H$ of $C_m$ contains at least one of the subgroups $C_{m,1}, \dots, C_{m,r}$. Then Theorem 4
implies that $(\Gamma, \circ)$ is $\tilde{\epsilon}$-close (and thus $\epsilon$-close) to the class of cyclic groups, which contradicts our hypothesis.

We conclude that, when $(\Gamma, \circ)$ is $\epsilon$-far from the class of cyclic groups, for each value $\gamma$ chosen by the algorithm
at Step 4, at least one of Assertion (i) and Assertion (ii) does not hold. If Assertion (i) does not hold for a
specific value $\gamma$, then this is detected with probability at least $1 - (1 - \tilde{\epsilon}/46)^{d_2}$ in the tests performed at Steps 5-7. If
Assertion (ii) does not hold for a specific value $\gamma$, then there exists a value $i_0 \in \{1, \dots, r\}$ such that
$\Pr_{u \in C_m}[f_\gamma(x + u) = f_\gamma(y + u)] \ge \frac{1}{2}$ for all distinct $x, y \in C_{m,i_0}$. This is detected with probability at least $1 - (1/2)^{d_3}$ in the
tests performed at Steps 8-12. By taking appropriate values $d_2 = \Theta(\epsilon^{-1} \log d_1) = \Theta(\epsilon^{-1} \log\log\log m)$ and
$d_3 = \Theta(\log d_1) = \Theta(\log\log\log m)$, the fact that Assertion (i) or Assertion (ii) does not hold will be detected with
overall probability at least 2/3 for all the values of $\gamma$ chosen by the algorithm. Algorithm CyclicTest$_\epsilon$ then
outputs FAIL with probability at least 2/3.
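One way to make the choice of $d_2$ and $d_3$ concrete is to require that each round wrongly passes with probability at most $1/(3d_1)$ and then apply a union bound over the $d_1$ rounds. The sketch below follows that reading, which is an assumption about the constants rather than something fixed by the proof; the function names are illustrative.

```python
import math


def repetitions_for(failure_rate, target_miss):
    """Smallest d with (1 - failure_rate)^d <= target_miss, for rates in (0, 1)."""
    return math.ceil(math.log(target_miss) / math.log(1 - failure_rate))


def choose_d2_d3(eps_tilde, d1):
    """Repetition counts so that each round misses with probability <= 1/(3*d1)."""
    miss = 1 / (3 * d1)
    d2 = repetitions_for(eps_tilde / 46, miss)   # Steps 5-7
    d3 = repetitions_for(0.5, miss)              # Steps 8-12
    return d2, d3
```

Both counts grow like $\log d_1$, and $d_2$ additionally scales like $1/\tilde{\epsilon}$, matching the orders of magnitude stated above.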

The query complexity follows from the fact that $f_\gamma$ can be evaluated using $O(\log m)$ queries and from the
observation that $r = O(\log m/\log\log m)$ since an integer $n$ has at most $O(\log n/\log\log n)$ distinct prime divisors
(see for example Ref. [3]). The time complexity follows from the fact that, additionally, elements of $\Gamma$ are
represented by strings of length $\lceil \log_2 q \rceil$. □